Protein structure prediction from the amino acid sequence is a fundamental and challenging problem in molecular biology. Stimulated by the difficulty of the overall structure prediction, computational methods for the prediction
Abstract
State-of-the-art methods for secondary structure prediction (Porter, Psi-PRED, SAM-T99sec, Sable) and solvent accessibility prediction (Sable, ACCpro) use evolutionary profiles represented by the position-specific scoring matrix (PSSM). It has been demonstrated that evolutionary profiles are the most important features in the feature space for these predictions. Unfortunately, using the PSSM leads to high-dimensional feature spaces, which may create problems with parameter optimization and generalization. Several recently published studies suggest that applying feature extraction to the PSSM may improve secondary structure prediction. However, none of the top-performing methods considered here uses dimensionality reduction to improve generalization. In the present study, we used simple and fast feature selection methods (t-statistics, information gain) that allow us to decrease the dimensionality of the PSSM by 75% and to improve generalization in secondary structure prediction compared to the Sable server.
Keywords: secondary structure prediction, feature selection, position-specific scoring matrix
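As an illustration of the kind of filter-based selection the abstract describes, the following is a minimal sketch of ranking PSSM-derived features by a t-statistic and keeping the top 25% (a 75% reduction). It is not the authors' implementation: the window length of 15 residues, the 20 columns per position, the binary (one-vs-rest) class labelling, the synthetic data, and the helper names `t_statistic_scores` and `select_top_fraction` are all assumptions made for the example.

```python
import numpy as np

def t_statistic_scores(X, y):
    """Absolute Welch t-statistic of each feature for binary labels y (0/1)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    mean_diff = pos.mean(axis=0) - neg.mean(axis=0)
    # Unbiased per-class variances; the small constant avoids division by zero.
    std_err = np.sqrt(pos.var(axis=0, ddof=1) / len(pos)
                      + neg.var(axis=0, ddof=1) / len(neg)) + 1e-12
    return np.abs(mean_diff) / std_err

def select_top_fraction(scores, keep=0.25):
    """Sorted indices of the highest-scoring features (a `keep` fraction)."""
    k = max(1, int(round(keep * len(scores))))
    return np.sort(np.argsort(scores)[::-1][:k])

# Synthetic stand-in for windowed PSSM features (assumed layout:
# 15-residue window x 20 amino-acid columns = 300 features per residue).
rng = np.random.default_rng(0)
n_samples, window, n_columns = 400, 15, 20
X = rng.normal(size=(n_samples, window * n_columns))
y = rng.integers(0, 2, size=n_samples)  # e.g. helix vs. non-helix (assumed)

# Make the first 20 features informative by shifting them for class 1.
X[:, :20] += 1.5 * y[:, None]

scores = t_statistic_scores(X, y)
selected = select_top_fraction(scores, keep=0.25)
X_reduced = X[:, selected]  # 300 features reduced to 75
print(X.shape, "->", X_reduced.shape)
```

A filter of this kind scores each feature independently of the downstream classifier, which is what makes it fast; an information-gain ranking could be substituted for `t_statistic_scores` without changing the selection step.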
Similar resources
Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches
A DNA sequence, which contains all genetic traits, is not itself a functional entity. Instead, it is converted into protein sequences through transcription and translation. The protein sequence then takes on a 3D structure, which is the functional unit and manages biological interactions using the information encoded in DNA. Every life process one can imagine is undertaken by proteins with specific functio...
Computational Prediction of the Effects of Single Nucleotide Polymorphisms of the Gene Encoding Human Endothelial Nitric Oxide Synthase
ABSTRACT Background and Objective: Genetic variations in the gene encoding endothelial nitric oxide synthase (eNOS) enzyme affect the susceptibility to cardiovascular disease. Identification of the way these changes affect eNOS structure and function in laboratory conditions is difficult and time-consuming. Thus, it seems essential to ...
In silico Prediction and Docking of Tertiary Structure of LuxI, an Inducer Synthase of Vibrio fischeri
Background: LuxI is a component of the quorum sensing signaling pathway in Vibrio fischeri responsible for the inducer synthesis that is essential for bioluminescence. Methods: Homology modeling of LuxI was carried out using Phyre2 and refined with the GalaxyWEB server. Five models were generated and evaluated by ERRAT, ANOLEA, QMEAN6, and Procheck. Results: Five refined models were gener...
Application of Genetic Programming to Modeling and Prediction of Activity Coefficient Ratio of Electrolytes in Aqueous Electrolyte Solution Containing Amino Acids
Genetic programming (GP) is one of the computer algorithms in the family of evolutionary-computational methods, which have been shown to provide reliable solutions to complex optimization problems. The genetic programming under discussion in this work relies on tree-like building blocks, and thus supports process modeling with varying structure. In this paper the systems containing amino ac...